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Abstract 

The statistical properties of estimator using covariance matrix for the account of 
point-to-point correlations due to systematic errors are analyzed. It is shown that 
the covariance matrix estimator (CME) is consistent for the realistic cases (when sys- 
tematic errors on the fitted parameters are not extremely large comparing with the 
statistical ones) and its dispersion is always smaller, than the dispersion of the sim- 
plified x 2 estimator applied to the correlated data. The CME bias is negligible for 
the realistic cases if the covariance matrix is calculated during the fit iteratively using 
the parameter estimator itself. Analytical formula for the covariance matrix inversion 
allows to perform fast and precise calculations even for very large data sets. All this 
allows for efficient use of the CME in the global fits. 
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INTRODUCTION 



Modern particle physics development becomes more and more based on the analysis of precise 
experimental data. This demands refining of all stages of the data inference including the 
account of correlations due to systematic uncertainties which are often comparable or even 
larger than the statistical ones. In particular this problem is important for the precise tests 
of Standard Model and determination of the parton distributions [[I], |2j. Many authors for 
the sake of simplicity very often use approaches which ignore point-to-point correlations due 
to systematic errors, i.e. sum all errors in quadrature or drop systematics at all. It is evident 
that if the systematic errors are important source of the data uncertainty such approaches 
can lead to the distortion of the estimated errors on the fitted parameters. At the same 
time the construction of estimators accounting for the correlations is not straightforward 
since the competitive probabilistic model of data can be used in the analysis. Essentially 
two generic models are possible: One based on the frequentist treatment of systematic 
shifts and another one based on the Bayesian approach. This paper is concentrated on the 
analysis of statistical properties of the estimators within the Bayesian treatment of systematic 
errors. An introduction into this scope given in Ref. || contains argumentation in favor of 
this approach. The only point that we would like in particular underline here is that the 
Bayesian treatment is the only constructive way in the case of many sources of systematics 
when classical treatment which implies introduction of additional parameter for every source 
of systematic errors can cause great problem with the interpretation/representation of the 
function of the large number of arguments. 

The natural way to account for point-to-point correlations due to systematic errors within 
Bayesian approach is to use covariance matrix associated with systematic errors (see e.g. 
Refs. 0,0]). Meanwhile, there are concerns that the covariance matrix estimator (CME) can 
result in biased values of the parameters values and their dispersions (see Refs. |6|, [7|, [|, U). 
In this connection it worth to recall that the estimators accounting for the data correlations 
often exhibit poor statistical properties regardless they use covariance matrix or not. For 
example as it was shown in Ref. [|Tt]] the sample dispersion estimated from the the correlated 
Monte Carlo data sets can acquire the bias equal to the dispersion value itselfQ At the same 
time the estimators would be unbiased if the covariance matrix is not evaluated from the 
measurements itself. Indeed, the unbiased estimator for the correlated Monte-Carlo data 



was constructed in Ref. |TlJ using the modeled covariance matrix. 

Running this way, one can hope to construct the unbiased estimators accounting for 
systematic errors through covariance matrix, but to be aware of its unbiasness the study of 
their properties is needed. In view of lack of the comprehensive information on this scope in 
literature, this paper is devoted to the analysis of the statistical properties of such estimators 
with a particular attention paid on the control of the bias. Through the paper the CME 
properties are compared with the properties of the simplest x 2 estimator (SCE) as well as if 



was done earlier in Ref. 12 



1 This effect is connected with the well known fact that the sample dispersion gives biased estimation of 
the studied distribution dispersion; the correlations merely amplify this bias. 
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1 THE SIMPLEST X 2 ESTIMATOR 



To illustrate our method of the statistical properties analysis we start from the analysis of 
uncorrelated measurements. In this case, if the data sample {yi) is supposed to be explicitly 
described by a theoretical model ti = fi(9°), 

yi = U + fiiVi, (1) 

where /i; are independent random variables, Oi are statistical errors, i = 1 . . . N, N is the 
total number of points in the sample. We adopt that theoretical model parameter 9° is 
scalar, the generalization of the formula on the case of vector parameter is evident. If yi 
are obtained in the counting experiment with the large number of events, /ij are Gaussian 
distributed, although it is not crucial for our consideration. As a rule the values of a* given 
in the experimental publications, are the estimators of the yi standard deviations, i.e. are 
random variables, but we neglect their fluctuations. The SCE is based on the minimization 
of functional 

m = £wfL=v£ (2) 



i=i » 

or, equivalently, solution of the equation 

1<9y 2 

«*>-5ir- a (3) 

The solution 9 is the estimator of parameter 9, which is the random variable depending 
on {yi}. To investigate statistical properties of 9 we expand the function around 9° 
and then apply Legendre inversion to obtain the series for 9 (see Ref. |13j for the details of 
method). Introducing 



/ dm 

\ 89 



d9 2 /' 09 \ 89 J 

one can obtain 

X XY bX 



9 ~ 9 u = - + + —r + . . . (4) 



a a 



2a 3 



where < > means averaging over the samples and the rejected part of the expansion 
contains the terms with the higher powers of 1/a and/or X and Y. In this approximation 
the dispersion of 9 is 



a 

and the bias is 

(X) + (XY) , b (X 2 ) 
a a 



For the SCE applied to the sample (JTJ) one can easily obtain 

(A'} = 0, 
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where fl(9) is the derivative on 9. The dispersion and the bias of this estimator are 



(5) 



(6) 



If fi(6) are the linear functions of 9 the series (f|) is truncated and equation can be 
solved exactly. One can see that in this case the estimator bias vanishes. For a non-linear 
data model the expansion (^) contains an infinite number of terms, but the contributions 
from the highest terms are proportional to the powers of D(9) and/or to the central moments 
of Hi higher than the second. These contributions are progressively suppressed comparing 
with the main terms if the data statistics rises. Here and through the paper we neglect the 
contribution from the high moments of j/j. Remind that the same approximation is used 
in deducing of the central limit theorem of statistics. This approach can be also used to 
justify the analysis of a nonlinear data model: The above formula can be applied to the data 
model with a "weak nonlinearity" , i.e. if its nonlinearity is not significant on the scale of the 
parameter standard deviation. 

Now let the sample to have a common additive systematic error. In accordance with the 
Bayesian approach to the treatment of systematic errors the measured values are given by 



Vi = ti + fiidi + \si 



(7) 



where Si are systematic shifts for every point and A is the random variable with zero average 
and unity dispersionP]. Consider the case of one source of systematic error, generalization on 
the many sources case is straightforward. For the sample (^) we loose statistical independence 
of measurements and with the account of the their correlations the relevant expression for 
the dispersion and bias are more complicated 



N 



c, 



( x2 ) = T,^n(W' J (9 ) = -a+ 
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where a and b are given by Eqn. (||), Cij is the covariance matrix for {yi} 



S{Sj ~\~ Sij (Ji (Tj , 



and Sij is Kronecker symbol. Expressions for a and b are the same as for the uncorrelated 
data case. In terms of the iV-component vectors 



Pi = —, 



Wo 



fWo) 



2 Emphasize, that A is not necessary Gaussian distributed. 
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the dispersion and the bias in this case can be expressed as 



D^9) = ^(l+p 2 zl), (9) 



(10) 



/ 2 2^\ 2 

[1 + -p z 1 Jz 12 - p ZiZ 2 

where p, 0i, 02 denote the vectors modulus, z\ is the cosine of angle between p and <f>i, z-i - 
between p and 02, -212 _ between 0i and 02. The dispersion of 8 is larger than for uncorrelated 
data because now it also accounts for the fluctuations due to systematic errors. As to the 
bias it remains zero for the linear model. 
If systematic errors are multiplicative 

Vi = (ti + fi i a i )(l + \r]i), (11) 

where r/j quantify the systematic errors. If both statistical and systematic errors are small 
comparing with £j 

yi^U + Hi(Ji + Xr]iU, 

the correlation matrix is 

Cij = TjiTjjUtj + (\,o-,n ,. (12) 

and the expressions for the bias and dispersion are the same as for the additive systematics 
case after the substitution s, — > rjiU. 

The Eqn. (Q) can be split into the parts which correspond to the statistical and systematic 
fluctuations. One can see that when vectors p and 0i are orthogonal the systematic error on 
8 is equal to zero and the total dispersion is suppressed. Such suppression can be illustrated 
on the example of the extraction of asymmetry from the data with general offset error. Let 
fi(9) = 8xi and both statistical and systematic errors are constant through the sample: 
Si = s, <7i = a. Then pi = s/a, (f)\ = Xi/a and z\ ~ J2 x i- ^ the positive and negative 
values of Xi compensate each other in the measurements, z% = and the systematic error 
vanishes. The appropriate data filtration can also be used to suppress the dispersion (f|). 
To clarify the mechanism of this suppression let us trace the effect of a separate data point 
on the dispersion value. Add to the data set a point with statistical error ao, systematic 
error so and the data model fo(9). If the initial data set is large and the systematic error is 
comparable with statistics, i.e. 

AT>1, P>1, 

0! » P0!^ » ^(8 ), (13) 

O"0 o"o 



the change of Dq{9) after adding the new point is 



AD A (8) w — — 



(14) 



The second term in brackets is always negative and gives the decrease of dispersion due to 
improved statistical precision. At the same time the first term can be negative or positive, 
depending on the signs of Z\ and Sq. Its absolute value can be larger than the absolute value 
of the second term and then Dq(9) can increase or decrease after adding the new point. This 
is manifestation of inconsistency of the SCE applied to the correlated data set. The balance 
between terms of Eqn. (0) is defined by the distribution of fl(8 )/si and cuts of the tails of 
this distribution can decrease the estimator dispersion. 
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2 THE COVARIANCE MATRIX ESTIMATOR 

If systematic error is additive and covariance matrix is known a priori and is given by 
one can use for the parameter estimation the following functional minimization 

N 



(15) 



where E^ is the inverted correlation matrix. This problem can be reduced to the uncorrelated 
case using the linear transformation of the vector {fi(0) — yt} and the estimator is linear for 
the linear data model. Besides, if statistical and systematics fluctuations obey the Gaussian 
distribution, this estimator provides minimal dispersion due to the Cramer-Rao inequality. 

One can easily derive the expressions necessary to calculate the estimator bias and dis- 
persion 

(X) = 0, 

N 

{X 2 ) = -a = J2 fMEqftOo), 



(XY) 



b 
3 



N 



Substituting in the above relations the explicit expression for Eij 

1 



E, 



we obtain the estimator dispersion 



PiPj 



1 



P 1 



1 



1 + 



p 2 z! 



i + p 2 (i 



1 c 



where is the ratio of the total dispersion to the pure statistical one. If p and 
collinear the dispersion of the estimator is 



(16) 



>i are 



which coincide with the SCE dispersion flj^). One can see that if p and <f>i are not collinear 
the SCE dispersion (g) is always larger than the CME dispersion (|l^). This can be readily 
explained qualitatively. For SCE the fitted curve tightly follows the data points and, if these 
points are shifted due to the systematic errors fluctuations, the parameter gains appropriate 
systematic errors. At the same time, since for the CME the information on the data corre- 
lations is explicitly included in x 2 , the correlated fluctuation of the data due to systematic 
shift does not necessary leads to the fitted curve shift and the parameter deviation gets 
smaller than for SCE. The exclusion occurs if z\ = 0, when p and <f>\ are collinear and the 
systematic shift can be perfectly compensated by the change of parameter. If these vectors 
are orthogonal the CME dispersions is 



D 



A,± 
M 
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i.e. it is just the same as the dispersion of SCE applied to the data set without correlations 
(H). Qualitatively it corresponds to the measurements scheme when systematic shift for the 
different points compensate each other, e.g. as in the example considered at the end of Sec. 1. 

For the modern experiments systematic errors are often of the same order as statistical 
ones and if iV ^> 1 then p^> 1. In this limit and if p and are not collinear 



Km « (it) 



and 



D^e) « ^. (is) 

One can see that in the second case the estimator standard deviation rises linearly with the 
increase of the systematics, whereas the CME dispersion saturates. This difference can be 
illustrated on the numerical example inspired by the elastic proton-proton scattering. Let 
us choose 

fi = Uexp(- Vx <\ x t = 0.U, 
where U = 100, V — 10, % — 1 . . . 9. Generating 100 data sets (0) with these fi and 

<Ti = om x l^, Si = - (19) 

we minimized functionals @ and (|15|) varying U and V to obtain their estimators U and 
V. The values of (U — U) 2 and (V — V) 2 for all of the generated data sets were averaged to 
obtain the estimators dispersions. The results on the standard deviation of U for different 
values of k are given in Fig. p] (the picture for V is similar). One can see that at large k the 
CME and the SCE standard deviations differ by factor of 3. 

The example of dispersion suppression observed in the analysis of real experimental data 
can be found in Ref. In this paper we performed the leading order QCD fit to the 




inclusive deep inelastic scattering data of Refs. [15|, [L6[ obtained by the BCDMS collaboration 
in order to determine the parton distribution functions and the strong coupling constant 
value a s . The two different estimators were used and the different estimates were obtained. 
For the SCE the standard deviation of a s (M z ) is 0.015, while for the CME it is 0.007. The 
difference in the gluon distribution bounds for these estimators can is given in Fig. |^. One 
can see that the standard deviation of the gluon distribution for the CME is also about a 
half of the SCE standard deviation. 

If Z\ ^ 1, the change of CME dispersion after adding a new point to the large sample as 
defined by Eqn. (O) is 



AZ$(0) 



1 



_ *1J 



(Tn 



fo( e o) —so 



This change is always negative that proves the CME consistency. Remind, that this is not 
necessary for the SCE (see Sec. 1). The same conclusion can be drawn from the comparison of 
Eqns. (0) and Indeed, the CME dispersion falls with increase of statistical significance 
of the data set (i.e. decrease of a or rise of N) while the SCE dispersion does not. Note, that 
due to consistency of the CME the filtration procedure described in Sec. 1 is not meaningful 
for it. 
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Figure 1: The standard deviations of SCE (circles) and CME (squares) for U at different 
scales of systematic errors k. The lines correspond to the calculation performed with the 
two-dimensional generalization of Eqns. (9,16). 



The CME bias is 



Dm 



Zl2 ~ 



l + p- 



■ZiZ 2 



which vanishes for the linear data model and saturates in the limit of p ^> 1 contrary to the 
SCE. In the numerical example (IS) at k — 0.007 the CME bias is 0.07, whereas the SCE 
bias is 0.13. 

For the multiplicative systematic errors the covariance matrix in unknown a priori and 
one is to calculate it using the parameter estimator. Proceeding this way in the minimization 
of the functional (|T5|) we get 



JV 



1 N 



(20) 



The difference with corresponding expression for the case of additive systematic errors is in 
the second term of Eqn. (p0|). For the linear data model this term is 



i N a2 

- 2 2^ "i3 U V- 2 (l + p2)2 
i,j=l 



[p\zl - 1) - 3p% 2 + 1] , 



where 



ji 



03 is modulus of 03 and z% is the cosine of the angle between 03 and p. The ratio of the 
second term of Eqn. © to the first term a« = £ fl^E^f'^ ) is 



(l + p 



(21) 
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Figure 2: Bounds of gluon distribution obtained from the LO QCD fit to BCDMS data with 
different estimators (the SCE: a; the CME: b). Full lines correspond to the total experimental 
errors, dashed ones - to the statistical only. 



If 6vi ~ 0(1) (that is valid for most real cases), 0(i] 2 )a^ for all values of p, i.e. it 

is suppressed comparing with the first term for small rj. Neglecting as elsewhere the third 
and fourth central moments of {jji}, one can obtain that < X 2 >^ —a and the estimator 
dispersion for multiplicative systematic errors DjJ ps . 

In the case of multiplicative systematics errors Eqn. (|3|) is nonlinear even for the linear 
data model. As a consequence, the expressions responsible for the bias 



1 - 

w = o E E 'if^ 



N 



N N 1 N 

b = 3j2 fW°) E ijf"(e°) + 3 E f^°) E y^°) + o E E '"P^ 



A' 



iV 



TV 



A' 



(22) 



jj=l ij=l ij'=l i,i=l 

do not vanish even if f"(9) is equal to zero. Meanwhile the bias due to the estimator 
nonlinearity is small comparing with the estimator standard deviation. Since 1/-Dm ~< 
X 2 >m —a the bias of estimator with multiplicative systematic errors is 



The first term in the brackets of Eqn. (|23"D is 

(jQ _ _03_P£3_ 



(X) , (X1Q-6/2 
(-a) 3 / 2 



—a 



(23) 



—a 



(24) 



9 



The contribution to the second term in brackets of Eqn. ( [23] ) from fi(^°)^ijfj(^°) * s 
proportional to 



/r " J ( ^ 7 7 7 \ f 3/2 

— Z1Z3 — Z-L3 I £ M 



0! (1+p 2 ) VI + P 2 

and hence it is ~ 0(t]^ 2 ). As one can conclude from Eqns. ( |2~T| , |2~iD the contribution to the 



same term from ^2E",Cij ■ ^ELCy is 0{rf^j^). And finally since 



I N Av$ 
i,j=l 1 

the contribution to Eqn. (p3|) coming from this term is 0(r] 3 ^ 2 ). In summary, for the linear 
data model the estimator bias is a sum of terms 0(t] p ^ m )D^ with p > 1 and q < 3/2. 
Besides, at small p all the four contributions to the bias which survive for the linear data 
model are ~ p while at large p they are ~ 1/p. Summarizing, one can conclude that the 
estimator is negligible excluding the extreme cases with very large £m- 

The explicit estimate of the bias can be obtained from the Eqns. ( |22|J23| ). Meanwhile it 



requires rather lengthy calculations and more simple tool for the bias evaluation is admirable. 
A convenient way for this is to trace the net residual 



N 

R 



1 f> MO) - ih 



Expanding fi(6) near #0 and keeping only the first term in Eqn. @ one obtains for the 
sample (0) 

If the estimator is unbiased, the value of R averaged over the samples is equal to zero. 
Nevertheless the particular values of R may be not equal to zero due to fluctuations. For 
the limited £m the dispersion of R is 

1 N A 

D ^ R ) = W2 E "l_ PlP ' + 0(1/N). (25) 



<j=i V 1 + pf\/ 1 + p] 



N 2 

If the analyzed data come from a single experiment with dominating systematics (i.e with 



p > 1) then D(R) ~ 1. In particular for the BCDMS data of Refs. fL5j, D(R) w 0.7. 
For N exp independent experiments involved in the analysis D(R) ~ 1/N exp . Comparing 
the net residual R with this value allows to get a guess about the estimator bias. More 
definite conclusion can be drawn after the comparison of R with its dispersion calculated 
using Eqn. (|25|) . 

3 PLANNING OF THE COUNTING EXPERIMENTS 

In a particular case when the differential cross section on the variable x is measured, the 
predicted average number of events in the i—th bin of is 

(Ni) = LfiAxi/3i, 
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where L is the integral experiment luminosity, $ is the registration efficiency, and Axi is the 
bin width. Neglecting the fluctuations of Ni the statistical error on the %— th measurement 
is 

„ _ 



LAxiPi 
and 

1 Lf3 
—s = —rAxi. 

The scalar product of the vectors p and cf) is 

N 



(p-$)=Lj2 l j 1 Pi^i 

i=l J 1 

and 



- \f'] 2 N M 2 

t=l " 8=1 " 

For the dense measurements these scalars can be reduced to the integrals over the measure- 
ments region Q: 



and 



L / [f\x)] 2 dx, p 2 = L [s(x)} 2 dx, 
Jn Jn 



where dx = (3(x)/ f(x)dx. The latter expressions can be used in the equations for the 
estimators dispersions^. This approach is convenient for the future experiment optimization 
since it allows for to analyze integrated expression in order to search for the optimal region 
of measurements. For the simple functions f(x), f3(x), and s(x) such analysis sure can be 
performed analytically. 



4 CONCLUSION 

In conclusion, the CME is a convenient tool for the analysis of the data sets with the 
account of correlations due to systematic errors. The CME is consistent for the realistic 
cases (when systematic errors on the fitted parameters are not extremely large comparing 
with the statistical ones) and its dispersion is always smaller, than the dispersion of the x 2 
estimator without account of correlations. The estimator bias is negligible for the realistic 
cases if the covariance matrix is calculated during the fit iteratively using the parameter 
estimator itself. Analytical formula for the covariance matrix inversion allows to perform 
fast and precise calculations even for very large data sets. The latter is especially important 
in view of numerical instabilities occurring in the fits to precise data in the case of large 
correlation between the fitted parameters (see in this connection Ref. ||17|| ). 

A particular attention should be paid on the connection between the estimator dispersion 
and the confidence interval. For a known distribution of the estimator the confidence interval 
can be easily calculated (e.g. it is well known that for the Gaussian distribution one standard 

3 As a result one obtains the Fisher's information for the correlated data case. 
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deviation corresponds to the 67% confidence level). Unfortunately due to the possible non- 
Gaussian nature of the systematic errors one cannot prove that an estimator accounting 
for systematics is Gaussian distributed. However for the large number of systematic errors 
of comparable scale the estimator should obey Gaussian distribution just to the central 
limit theorem of statistics. Otherwise the robust estimates of the confidence intervals, e.g. 
Chebyshev's inequality, should be used. 
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